
Conversation


@lkomali lkomali commented Jan 16, 2026

Docs for profiling with audio models

Summary by CodeRabbit

Release Notes

  • Documentation
    • Added a new tutorial for profiling Audio Language Models with AIPerf, including setup instructions for vLLM servers, verification steps, and guidance on profiling with synthetic audio, with or without accompanying text prompts. Covers configuration options for audio generation and example CLI commands; a hedged example invocation is sketched after this list.
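
A rough sketch of such an invocation, assuming the `aiperf profile` subcommand and the flag names cited in the review comments below; the URL, token count, and other values are illustrative, not the tutorial's exact command:

```bash
# Sketch only: flag names are taken from the review comments on this PR;
# the authoritative commands live in docs/tutorials/audio.md.
aiperf profile \
  --model Qwen/Qwen2-Audio-7B-Instruct \
  --url http://localhost:8000 \
  --endpoint-type chat \
  --audio-sample-rates 16 \
  --audio-depths 16 \
  --synthetic-input-tokens-mean 128  # optional: pair synthetic audio with text prompts
```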


Signed-off-by: lkomali <lkomali@nvidia.com>

github-actions bot commented Jan 16, 2026

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@0fbc6b28483b70646a68da544f704aac766fb3a5

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@0fbc6b28483b70646a68da544f704aac766fb3a5
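
To sanity-check the install, one quick step (this assumes the wheel exposes an `aiperf` console script, which the profiling commands in the tutorial rely on):

```bash
# Prints the CLI help if the package installed correctly and is on PATH
aiperf --help
```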

Last updated for commit: 0fbc6b2

@github-actions github-actions bot added the feat label Jan 16, 2026

coderabbitai bot commented Jan 16, 2026

Walkthrough

A new documentation tutorial is added explaining how to profile Audio Language Models using AIPerf with a vLLM-backed OpenAI-compatible chat endpoint. It covers vLLM server setup (direct and Docker), health verification, synthetic audio generation configuration options, and example CLI invocations for profiling workflows.
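
For the health-verification step, a minimal request against the chat completions endpoint might look like the sketch below, assuming vLLM's default port 8000 and the Qwen/Qwen2-Audio-7B-Instruct model named elsewhere in this review; the tutorial's exact request may differ:

```bash
# Send a trivial chat completion to confirm the server is up and the model is loaded
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2-Audio-7B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 8
      }'
```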

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation: Audio Profiling Tutorial**<br>`docs/tutorials/audio.md` | New tutorial documenting audio LLM profiling workflow with vLLM server setup (direct and Docker), health checks via chat completions, synthetic audio generation parameters (duration, format, sample rates, channels, batch size), and example CLI invocations with and without text prompts. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 A fluffy tale of audio streams,
Where vLLM serves our profiling dreams,
With synthetic voices, Docker's might,
We profile the audio, oh what delight!
Documentation hopping, clear and bright! 🎵

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: Add docs for audio models' clearly and concisely describes the main change: adding documentation for profiling audio models, which matches the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/tutorials/audio.md`:
- Around line 18-32: Update the vLLM invocation lines to use a valid
--limit-mm-per-prompt syntax: replace the invalid `--limit-mm-per-prompt
audio=2` usage in both the `vllm serve` command and the `docker run ... --model`
invocation with either JSON form `--limit-mm-per-prompt '{"audio": 2}'` or
dotted form `--limit-mm-per-prompt.audio 2`; ensure the change is applied to the
`vllm serve Qwen/Qwen2-Audio-7B-Instruct` example and the `docker run ...
vllm/vllm-openai:latest --model Qwen/Qwen2-Audio-7B-Instruct` example so the
`--limit-mm-per-prompt` flag is syntactically correct.
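
Applying that fix, the corrected invocations would look roughly like the sketch below (the GPU and port flags on the Docker line are assumptions, not copied from the tutorial):

```bash
# Direct invocation, JSON form of --limit-mm-per-prompt
vllm serve Qwen/Qwen2-Audio-7B-Instruct \
  --limit-mm-per-prompt '{"audio": 2}'

# Docker invocation (--gpus and -p values here are illustrative)
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen2-Audio-7B-Instruct \
  --limit-mm-per-prompt '{"audio": 2}'
```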
🧹 Nitpick comments (1)
docs/tutorials/audio.md (1)

87-97: Clarify list parameter usage in examples.

The documentation describes --audio-sample-rates and --audio-depths as lists to "randomly select from," but the examples (lines 62, 79) only show single values (e.g., --audio-sample-rates 16). Consider adding a brief note or example showing how to pass multiple values, or clarify that single values are also accepted.

📝 Suggested clarification
 - `--audio-sample-rates`: List of sample rates in kHz to randomly select from (default: 16)
+  - Example: `--audio-sample-rates 16` (single value) or `--audio-sample-rates 16 24 48` (multiple values)
 - `--audio-depths`: List of bit depths to randomly select from (default: 16)
+  - Example: `--audio-depths 16` (single value) or `--audio-depths 16 24` (multiple values)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae81346 and ec98485.

📒 Files selected for processing (1)
  • docs/tutorials/audio.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 389
File: src/aiperf/endpoints/openai_chat.py:41-46
Timestamp: 2025-10-23T03:16:02.685Z
Learning: In the aiperf project, the ChatEndpoint at src/aiperf/endpoints/openai_chat.py supports video inputs (supports_videos=True) through custom extensions, even though the standard OpenAI /v1/chat/completions API does not natively support raw video inputs.
📚 Learning: 2025-10-23T03:16:02.685Z
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 389
File: src/aiperf/endpoints/openai_chat.py:41-46
Timestamp: 2025-10-23T03:16:02.685Z
Learning: In the aiperf project, the ChatEndpoint at src/aiperf/endpoints/openai_chat.py supports video inputs (supports_videos=True) through custom extensions, even though the standard OpenAI /v1/chat/completions API does not natively support raw video inputs.

Applied to files:

  • docs/tutorials/audio.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build (macos-latest, 3.12)
  • GitHub Check: build (ubuntu-latest, 3.13)
  • GitHub Check: integration-tests (ubuntu-latest, 3.13)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: integration-tests (ubuntu-latest, 3.10)
  • GitHub Check: build (ubuntu-latest, 3.11)
  • GitHub Check: build (ubuntu-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.13)
  • GitHub Check: build (ubuntu-latest, 3.12)
  • GitHub Check: build (macos-latest, 3.11)
  • GitHub Check: integration-tests (ubuntu-latest, 3.12)
  • GitHub Check: integration-tests (ubuntu-latest, 3.11)
🔇 Additional comments (4)
docs/tutorials/audio.md (4)

1-4: LGTM!

Copyright header is properly formatted with the correct year and standard SPDX identifiers.


6-11: LGTM!

The introduction clearly states the purpose and scope of the tutorial.


69-85: LGTM!

The example effectively demonstrates combining audio inputs with text prompts using the --synthetic-input-tokens-mean flag.


56-67: No action required. The --endpoint-type chat fully supports audio inputs as a documented feature in AIPerf's ChatEndpoint implementation.



codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Signed-off-by: lkomali <lkomali@nvidia.com>

@ajcasagrande ajcasagrande left a comment


It looks good. Now that we support loading media from files, we may want to mention that.

Comment on lines +72 to +74
{"texts": ["Transcribe this audio."], "audios": ["wav,UklGRiIFAABXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0Yf4EAAD..."]}
{"texts": ["What is being said in this recording?"], "audios": ["mp3,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4Ljc2LjEwMAAAAAAAAAAA..."]}
{"texts": ["Summarize the main points from this audio."], "audios": ["wav,UklGRooGAABXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0YWY..."]}

the ... are just placeholders because the data is long right? might want to mention that


we actually support loading audio from file and converting to base64 automatically, may want to include that, or just change to that. though idk how that would work with the CI

@lkomali lkomali marked this pull request as draft January 23, 2026 17:41